TextGrid : Optical Character Recognition (OCR)

The OCR workflows let you convert scanned images of printed “Fraktur” (blackletter) and “Antiqua” texts into machine-encoded text. The resulting text can then be edited in TextGrid, browsed, or used for text mining.

If no Project exists yet, create a new one. After you refresh the Navigator, the new Project is displayed there; select it so that images can be imported into it. Next, open the “File” menu and select “Import local files” to import the images. After you refresh the Navigator again, it displays the imported images. Then open the workflow. The views “Input Document for Workflow”, “Workflow Results”, “Workflow Selection” and “Job Management” appear on the right; they are used in the following steps.

First, a new workflow is needed. It can be created in the Workflow Section. Choose either OCRopus Fraktur or OCRopus Modern as the service, and keep the default values in the steps that follow. In the last window, the workflow can be named and assigned to a specific Project.

After you refresh the Workflow Section, the newly created workflow becomes visible. Double-clicking it displays an empty list in the Workflow Preparation View. The images to be processed can simply be dragged there from the Project on the left.

Now select the workflow in the Workflow Selection View and assign it a “Target project”. Clicking “Run” starts the text recognition; while it is running, its status is displayed in the Job Management View.

Once the job has finished, the result is displayed in the Workflow Results View and can be opened by right-clicking it and choosing “Open with > Text editor”.
